144 research outputs found
Effective Genetic Risk Prediction Using Mixed Models
To date, efforts to produce high-quality polygenic risk scores from
genome-wide studies of common disease have focused on estimating and
aggregating the effects of multiple SNPs. Here we propose a novel statistical
approach for genetic risk prediction, based on random and mixed effects models.
Our approach (termed GeRSI) circumvents the need to estimate the effect sizes
of numerous SNPs by treating these effects as random, producing predictions
which are consistently superior to current state of the art, as we demonstrate
in extensive simulation. When applying GeRSI to seven phenotypes from the WTCCC
study, we confirm that the use of random effects is most beneficial for
diseases that are known to be highly polygenic: hypertension (HT) and bipolar
disorder (BD). For HT, there are no significant associations in the WTCCC data.
The best existing model yields an AUC of 54%, while GeRSI improves it to 59%.
For BD, using GeRSI improves the AUC from 55% to 62%. For individuals ranked at
the top 10% of BD risk predictions, using GeRSI substantially increases the BD
relative risk from 1.4 to 2.5.
Comment: main text: 14 pages, 3 figures. Supplementary text: 16 pages, 21
figures
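As a rough illustration of the core idea above (treating SNP effects as random
rather than estimating each one), here is a minimal GBLUP-style prediction
sketch in Python. It is not the GeRSI procedure itself (which additionally
handles case-control ascertainment), and the heritability value h2 and the
simulated genotypes are purely illustrative assumptions.

    import numpy as np

    def gblup_predict(Z_train, y_train, Z_test, h2=0.5):
        """Random-effects (GBLUP-style) prediction: individual SNP effects are
        never estimated; only the genetic relationship matrix is used.
        h2 is an assumed heritability (variance explained by the SNPs)."""
        n, p = Z_train.shape
        K_tr = Z_train @ Z_train.T / p        # train-train relationship matrix
        K_te = Z_test @ Z_train.T / p         # test-train relationships
        delta = (1.0 - h2) / h2               # noise-to-genetic variance ratio
        alpha = np.linalg.solve(K_tr + delta * np.eye(n),
                                y_train - y_train.mean())
        return y_train.mean() + K_te @ alpha  # predicted genetic values

    # toy usage with standardized genotypes (rows: individuals, cols: SNPs)
    rng = np.random.default_rng(0)
    Z = rng.standard_normal((200, 1000))
    beta = rng.standard_normal(1000) * np.sqrt(0.5 / 1000)
    y = Z @ beta + rng.standard_normal(200) * np.sqrt(0.5)
    print(gblup_predict(Z[:150], y[:150], Z[150:]).shape)  # (50,)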
Piecewise linear regularized solution paths
We consider the generic regularized optimization problem
$\hat{\beta}(\lambda) = \arg\min_{\beta} L(y, X\beta) + \lambda J(\beta)$. Efron, Hastie,
Johnstone and Tibshirani [Ann. Statist. 32 (2004) 407--499] have shown that for
the LASSO--that is, if $L$ is squared error loss and $J(\beta)$ is
the $\ell_1$ norm of $\beta$--the optimal coefficient path is piecewise linear,
that is, $\partial \hat{\beta}(\lambda) / \partial \lambda$ is piecewise
constant. We derive a general characterization of the properties of (loss $L$,
penalty $J$) pairs which give piecewise linear coefficient paths. Such pairs
allow for efficient generation of the full regularized coefficient paths. We
investigate the nature of efficient path following algorithms which arise. We
use our results to suggest robust versions of the LASSO for regression and
classification, and to develop new, efficient algorithms for existing problems
in the literature, including Mammen and van de Geer's locally adaptive
regression splines.
Comment: Published at http://dx.doi.org/10.1214/009053606000001370 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
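To see the piecewise-linear LASSO path concretely (squared error loss with an
$\ell_1$ penalty, as above), the sketch below uses scikit-learn's LARS-based
path routine, which returns the path only at its breakpoints because the
coefficients move linearly in between; the simulated data are illustrative,
not taken from the paper.

    import numpy as np
    from sklearn.linear_model import lars_path

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 10))
    beta = np.zeros(10)
    beta[:3] = [3.0, -2.0, 1.5]
    y = X @ beta + rng.standard_normal(100)

    # method="lasso" gives the exact LASSO path; alphas are the breakpoints
    alphas, _, coefs = lars_path(X, y, method="lasso")
    print(alphas.shape, coefs.shape)  # knots, and coefficients at each knot

    # between consecutive knots the path is linear in alpha, so intermediate
    # solutions can be recovered by interpolating the knot solutions
    mid = 0.5 * (coefs[:, :-1] + coefs[:, 1:])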
From Fixed-X to Random-X Regression: Bias-Variance Decompositions, Covariance Penalties, and Prediction Error Estimation
In statistical prediction, classical approaches for model selection and model
evaluation based on covariance penalties are still widely used. Most of the
literature on this topic is based on what we call the "Fixed-X" assumption,
where covariate values are assumed to be nonrandom. By contrast, it is often
more reasonable to take a "Random-X" view, where the covariate values are
independently drawn for both training and prediction. To study the
applicability of covariance penalties in this setting, we propose a
decomposition of Random-X prediction error in which the randomness in the
covariates contributes to both the bias and variance components. This
decomposition is general, but we concentrate on the fundamental case of least
squares regression. We prove that in this setting the move from Fixed-X to
Random-X prediction results in an increase in both bias and variance. When the
covariates are normally distributed and the linear model is unbiased, all terms
in this decomposition are explicitly computable, which yields an extension of
Mallows' Cp that we call RCp. RCp also holds asymptotically for certain
classes of nonnormal covariates. When the noise variance is unknown, plugging
in the usual unbiased estimate leads to an approach that we call $\widehat{RCp}$,
which is closely related to Sp (Tukey 1967), and GCV (Craven and Wahba 1978).
For excess bias, we propose an estimate based on the "shortcut-formula" for
ordinary cross-validation (OCV), resulting in an approach we call $RCp^+$.
Theoretical arguments and numerical simulations suggest that $RCp^+$ is
typically superior to OCV, though the difference is small. We further examine
the Random-X error of other popular estimators. The surprising result we get
for ridge regression is that, in the heavily-regularized regime, Random-X
variance is smaller than Fixed-X variance, which can lead to smaller overall
Random-X error.
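A small simulation sketch can make the Fixed-X versus Random-X distinction
concrete for least squares: it compares the error of predicting new responses
at the original (fixed) design points against predicting at freshly drawn
covariates, under an assumed Gaussian linear model with illustrative
dimensions, not the paper's settings.

    import numpy as np

    rng = np.random.default_rng(1)
    n, p, sigma, reps = 50, 10, 1.0, 2000
    beta = np.ones(p)

    fixed_err, random_err = [], []
    for _ in range(reps):
        X = rng.standard_normal((n, p))
        y = X @ beta + sigma * rng.standard_normal(n)
        beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
        # Fixed-X: new noise at the same covariate values
        y_new = X @ beta + sigma * rng.standard_normal(n)
        fixed_err.append(np.mean((y_new - X @ beta_hat) ** 2))
        # Random-X: both covariates and noise drawn afresh
        X_new = rng.standard_normal((n, p))
        y_rnd = X_new @ beta + sigma * rng.standard_normal(n)
        random_err.append(np.mean((y_rnd - X_new @ beta_hat) ** 2))

    print(np.mean(fixed_err), np.mean(random_err))  # Random-X error is larger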
Excess Optimism: How Biased is the Apparent Error of an Estimator Tuned by SURE?
Nearly all estimators in statistical prediction come with an associated
tuning parameter, in one way or another. Common practice, given data, is to
choose the tuning parameter value that minimizes a constructed estimate of the
prediction error of the estimator; we focus on Stein's unbiased risk estimator,
or SURE (Stein, 1981; Efron, 1986) which forms an unbiased estimate of the
prediction error by augmenting the observed training error with an estimate of
the degrees of freedom of the estimator. Parameter tuning via SURE minimization
has been advocated by many authors, in a wide variety of problem settings, and
in general, it is natural to ask: what is the prediction error of the
SURE-tuned estimator? An obvious strategy would be to simply use the apparent
error estimate as reported by SURE, i.e., the value of the SURE criterion at
its minimum, to estimate the prediction error of the SURE-tuned estimator. But
this is no longer unbiased; in fact, we would expect that the minimum of the
SURE criterion is systematically biased downwards for the true prediction
error. In this paper, we formally describe and study this bias.
Comment: 39 pages, 3 figures
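The downward bias of the minimized SURE criterion can be seen in a short
Monte Carlo sketch: the example below tunes a soft-thresholding estimator of a
sparse normal mean by SURE minimization (an illustrative choice, not
necessarily one of the paper's examples) and compares the average minimized
SURE value with the average true error of the SURE-tuned estimator.

    import numpy as np

    rng = np.random.default_rng(2)
    n, sigma, reps = 100, 1.0, 2000
    mu = np.concatenate([np.full(10, 3.0), np.zeros(n - 10)])  # sparse mean
    grid = np.linspace(0.0, 4.0, 81)                           # threshold grid

    min_sure, true_err = [], []
    for _ in range(reps):
        y = mu + sigma * rng.standard_normal(n)
        # SURE for soft thresholding at each threshold t (Stein, 1981):
        # ||y - mu_hat||^2 - n*sigma^2 + 2*sigma^2*df, with df = #{|y_i| > t}
        sure = np.array([
            np.sum(np.minimum(np.abs(y), t) ** 2) - n * sigma**2
            + 2 * sigma**2 * np.sum(np.abs(y) > t)
            for t in grid
        ])
        t_hat = grid[np.argmin(sure)]
        mu_hat = np.sign(y) * np.maximum(np.abs(y) - t_hat, 0.0)
        min_sure.append(sure.min())
        true_err.append(np.sum((mu_hat - mu) ** 2))

    # the minimized SURE value systematically under-reports the true error
    print(np.mean(min_sure), np.mean(true_err))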